1 Introduction

In this project, we aim to understand the effect of time until cyropreservation (room temperature or 4ºC) on single-cell transcriptional profiles of chronic lymphocytic leukemia (CLL) cells.

1.0.1 Description of the data

To that end, we drew blood from 3 CLL patients (ids: 1220, 1472, 1892) and cyropreserved the samples after 0h (fresh), 2h, 4h, 6h, 8h and 24h at either room temperature (RT) or 4ºC (4C). To eliminate batch effects, detect doublets and reduce the library cost, we performed cell hashing. In this protocol, each condition (in our case time-points) is labeled with a specific hashtag oligonucleotide (HTO) that is crosslinked with an antibody. The antibodies bind to ubiquitous cell surface markers, and the HTO are sequenced alongside the single-cell gene expression libraries. We have the following samples:

  • 1220_RT (note that we do not have 4ºC for this donor)
  • 1472_4C
  • 1472_RT
  • 1892_4C
  • 1892_RT

For each of them, we have 3 files: the expression matrix in sparse format, the list of the barcodes that identify the columns, and the list of genes that identify the rows (features). Moreover, the features file contains a column that distinguishes between genes (“Gene Expression”) and HTO (“Antibody Capture”).

1.0.2 Objective

The objective of this notebook is to demultiplex the barcodes (cells) back to its original time-point. To achive that, we will follow the pipeline from Seurat:

2 Pre-processing

2.1 Load packages

library(scater)
library(SingleCellExperiment)
library(Seurat)
library(Matrix)
library(tidyverse)

3 Demultiplex

# Load expression matrix, gene and cell metadata
libraries <- c("1472_RT", "1472_4C", "1892_RT", "1892_4C", "1220_RT")
cll_list <- list()

for (lib in libraries) {
  lib_path <- str_c("data/BCLLATLAS_03/", lib, "/filtered_feature_bc_matrix/")
  expression_matrix <- readMM(str_c(lib_path, "matrix.mtx.gz"))
  barcodes <- read_csv(str_c(lib_path, "barcodes.tsv.gz"), col_names = FALSE)
  colnames(barcodes) <- "barcode"
  features <- read_tsv(str_c(lib_path, "features.tsv.gz"), col_names = FALSE)
  colnames(features) <- c("ensembl", "symbol", "feature_type")
  rownames(expression_matrix) <- features$symbol
  colnames(expression_matrix) <- barcodes$barcode
  
  # Separate HTO and RNA matrices
  hto_ind <- which(str_detect(features$feature_type, "Antibody Capture"))
  rna_ind <- which(str_detect(features$feature_type, "Gene Expression"))
  cll_hto <- expression_matrix[hto_ind, ]
  cll_rna <- expression_matrix[rna_ind, ]
  
  # Setup Seurat object
  cll <- CreateSeuratObject(counts = cll_rna)
  
  # Normalize RNA data with log normalization
  cll <- NormalizeData(cll)
  
  # Find and scale variable features
  cll <- FindVariableFeatures(cll, selection.method = "vst")
  cll <- ScaleData(cll, features = VariableFeatures(cll))
  
  # Add HTO as an independent assay
  cll[["HTO"]] <- CreateAssayObject(counts = cll_hto)
  cll <- NormalizeData(cll, assay = "HTO", normalization.method = "CLR")
  
  # Demultiplex
  cll <- HTODemux(cll, assay = "HTO", positive.quantile = 0.99)
  
  # Append to list of Seurat objects
  cll_list[[lib]] <- cll
}

4 Visualize

We can visualize the results as ridge plots or heatmaps:

ridge_l <- map(cll_list, function(cll) {
  Idents(cll) <- "HTO_maxID"
  RidgePlot(
    cll,
    assay = "HTO",
    features = rownames(cll[["HTO"]])[1:6],
    ncol = 2
  )
})

heatmap_l <- map(cll_list, function(cll) {
  HTOHeatmap(cll, assay = "HTO", ncells = 5000)
})

# # Save
# walk2(ridge_l, names(ridge_l), function(ridge, lib) {
#   ggsave(
#     filename = str_c("results/plots/", lib, "_hashtag_demux_ridge.pdf"),
#     plot = ridge, height = 9, 
#     width = 16
#   )
# })
# walk2(heatmap_l, names(heatmap_l), function(heat, lib) {
#   ggsave(
#     filename = str_c("results/plots/", lib, "_hashtag_demux_heatmap.png"), 
#     plot = heat,
#     height = 4, 
#     width = 7
#   )
# })
ridge_l
## $`1472_RT`

## 
## $`1472_4C`

## 
## $`1892_RT`

## 
## $`1892_4C`

## 
## $`1220_RT`

heatmap_l
## $`1472_RT`

## 
## $`1472_4C`

## 
## $`1892_RT`

## 
## $`1892_4C`

## 
## $`1220_RT`

As we can see, there is a high signal-to-noise ratio for every HTO: the samples are easily identifible and there is no cross-contamination between hashtags. Overall, the number of cells is evenly distributed across time-points, which makes it easy to detect heterotypic doublets (doublets from different time-points). However, for the donor 1892, the cells assigned to 0h involve a larger fraction of the total, so we can expect an increased proportion of undetectable 0h-0h doublets.

Finally it is useful to get an overview of the number of singlets, doublets and negative cells per library:

hto_levels <- c("Negative", "Singlet", "Doublet")
cll_gg <- map(names(cll_list), function(id) {
  cll_list[[id]]@meta.data %>% 
    group_by(HTO_classification.global) %>% 
    summarise(count = n()) %>% 
    mutate(HTO_classification.global = factor(
      HTO_classification.global, 
      levels = hto_levels
    )) %>% 
    ggplot(aes(HTO_classification.global, count)) +
      geom_col() +
      geom_text(aes(label = count), 
                position = position_dodge(width = 0.9), vjust = -0.25) +
      labs(title = str_to_title(id), x = "", y = "number of cells") +
      theme_bw() +
      theme(axis.text.x = element_text(size = 11), 
            plot.title = element_text(hjust = 0.5))
})
cll_gg
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

There is a low proportion of doublets and negative cells. Thus, we can conclude that we were able to successfully demultiplex cells and obtain high-quality data.

5 Save demultiplexed Seurat object

To merge the 4 Seurat objects into one we use the merge command:

cll_merged <- merge(
  cll_list$`1472_RT`, 
  y = c(cll_list$`1472_4C`, cll_list$`1892_RT`, cll_list$`1892_4C`, cll_list$`1220_RT`),
  add.cell.ids = libraries, 
  project = "CLL_benchmarking"
)

# Recode and retain important variables
cll_merged$donor <- str_remove(colnames(cll_merged), "_.*$")
cll_merged$time <- str_remove(cll_merged$hash.ID, "....-..-")
cll_merged$temperature <- cll_merged$hash.ID %>% 
  str_remove("^....-") %>% 
  str_remove("-.+h$")
selection <- c("nCount_RNA", "nFeature_RNA", "HTO_classification", "hash.ID", 
               "time", "donor", "temperature")
cll_merged@meta.data <- cll_merged@meta.data[, selection]

Finally, we can save it as .RDS for future analysis:

# saveRDS(cll_merged, "results/R_objects/cll_Seurat_demultiplexed.rds")

6 Session Info

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.0               stringr_1.4.0               dplyr_0.8.4                 purrr_0.3.3                 readr_1.3.1                 tidyr_1.0.2                 tibble_2.1.3                tidyverse_1.3.0             Matrix_1.2-18               Seurat_3.1.4                scater_1.14.6               ggplot2_3.3.0               SingleCellExperiment_1.8.0  SummarizedExperiment_1.16.1 DelayedArray_0.12.2         BiocParallel_1.20.1         matrixStats_0.55.0          Biobase_2.46.0              GenomicRanges_1.38.0        GenomeInfoDb_1.22.0         IRanges_2.20.2              S4Vectors_0.24.3            BiocGenerics_0.32.0         BiocStyle_2.14.4           
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1             backports_1.1.5          sn_1.5-5                 plyr_1.8.6               igraph_1.2.4.2           lazyeval_0.2.2           splines_3.6.1            listenv_0.8.0            TH.data_1.0-10           digest_0.6.25            htmltools_0.4.0          magick_2.3               viridis_0.5.1            fansi_0.4.1              gdata_2.18.0             magrittr_1.5             cluster_2.1.0            ROCR_1.0-7               globals_0.12.5           modelr_0.1.6             RcppParallel_4.4.4       sandwich_2.5-1           colorspace_1.4-1         rvest_0.3.5              ggrepel_0.8.1            haven_2.2.0              xfun_0.12                crayon_1.3.4             RCurl_1.98-1.1           jsonlite_1.6.1           survival_3.1-8           zoo_1.8-7                ape_5.3                  glue_1.3.1               gtable_0.3.0             zlibbioc_1.32.0          XVector_0.26.0           leiden_0.3.3             BiocSingular_1.2.2       future.apply_1.4.0       scales_1.1.0             mvtnorm_1.1-0            DBI_1.1.0                bibtex_0.4.2.2           Rcpp_1.0.3               metap_1.3                plotrix_3.7-7           
##  [48] viridisLite_0.3.0        reticulate_1.14          rsvd_1.0.3               tsne_0.1-3               htmlwidgets_1.5.1        httr_1.4.1               gplots_3.0.3             RColorBrewer_1.1-2       TFisher_0.2.0            ica_1.0-2                farver_2.0.3             pkgconfig_2.0.3          uwot_0.1.5               dbplyr_1.4.2             labeling_0.3             tidyselect_1.0.0         rlang_0.4.5              reshape2_1.4.3           cellranger_1.1.0         munsell_0.5.0            tools_3.6.1              cli_2.0.2                generics_0.0.2           broom_0.5.5              ggridges_0.5.2           evaluate_0.14            yaml_2.2.1               npsurv_0.4-0             fs_1.3.2                 knitr_1.28               fitdistrplus_1.0-14      caTools_1.18.0           RANN_2.6.1               pbapply_1.4-2            future_1.16.0            nlme_3.1-145             xml2_1.2.2               rstudioapi_0.11          compiler_3.6.1           beeswarm_0.2.3           plotly_4.9.2             png_0.1-7                lsei_1.2-0               reprex_0.3.0             stringi_1.4.6            lattice_0.20-40          multtest_2.42.0         
##  [95] vctrs_0.2.3              mutoss_0.1-12            pillar_1.4.3             lifecycle_0.1.0          BiocManager_1.30.10      Rdpack_0.11-1            lmtest_0.9-37            RcppAnnoy_0.0.15         BiocNeighbors_1.4.2      data.table_1.12.8        cowplot_1.0.0            bitops_1.0-6             irlba_2.3.3              gbRd_0.4-11              patchwork_1.0.0          R6_2.4.1                 bookdown_0.18            KernSmooth_2.23-16       gridExtra_2.3            vipor_0.4.5              codetools_0.2-16         MASS_7.3-51.5            gtools_3.8.1             assertthat_0.2.1         withr_2.1.2              sctransform_0.2.1        mnormt_1.5-6             multcomp_1.4-12          GenomeInfoDbData_1.2.2   hms_0.5.3                grid_3.6.1               rmarkdown_2.1            DelayedMatrixStats_1.8.0 Rtsne_0.15               lubridate_1.7.4          numDeriv_2016.8-1.1      ggbeeswarm_0.6.0